%We begin our investigation of subcategorization in biomedicine by
%looking at two different definitions of
%subcategorization. 

The traditional linguistic notion of subcategorization refers to the
syntactic arguments of a verb, that is, the syntactic phrase types
which occur obligatorily or with high probability for any given verb.
Some common syntactic phrase types which can serve as
arguments to a verb include noun phrases, prepositional phrases,
subordinate clauses, adjectives and adverbs.

Some basic examples of subcategorization frames (SCFs) can be seen in
Table~\ref{t:basic}. For the SCF names we use COMLEX Syntax notation
\citep{grishman:94}, which includes an abbreviation for each phrase type in the SCF. Thus the SCF for a transitive verb (taking one direct object noun phrase) is NP, and for a verb taking a direct object and a prepositional phrase NP-PP.  Note that we do not specify the subject NP as part of the SCF, since subjects are obligatory in English. Most verbs take several SCFs. In Table~\ref{t:basic}, it can be seen that {\it decrease} may occur with the following SCFs: NP, NP-PP, or $\oslash$ (intransitive). On the other hand, {\it compare} occurs with the first two but not as an intransitive.

\begin{table}
\begin{tabular}{|l|p{0.8\columnwidth}|}
\hline
SCF & Example \\
\hline
\hline
NP & The retraction screw and blade \underline{decreased} [$_{NP}$the risks of vessel injuries]. \\
NP-PP & Heterozygosity for twine also \underline{decreases} [$_{NP}$the frequency of precocious NEB] [$_{PP}$to less than 10\%]. \\
$\oslash$ & The contribution of cardiovascular diseases as cause of death \underline{decreased}. \\
\hline
NP & We \underline{compared} [$_{NP}$the performance of the Charlson and the Elixhauser comorbidity measures]. \\
NP-PP & We \underline{compared} [$_{NP}$the predictions] [$_{PP}$to the known interaction signs]. \\
$*$ $\oslash$ & $*$ We \underline{compared}. \\
\hline
\end{tabular}
\caption{Sample SCFs for {\it decrease} and {\it compare}. Note that {\it compare} does not occur as an intransitive, represented by the asterisk. All examples adapted from the PMC OA corpus.}
\label{t:basic}
\end{table}


% 49: Anti-EP , anti-R67 and anti-R80 decreased virus binding by up to 40 % .
% 49: Treatment of cells with both IL-6 and TGF-$\beta$1 decreased the percentage of cells with sub-G1 DNA content to 6.7 %

Additional examples of SCFs are shown in
Table~\ref{t:complex_scfs}. Here the COMLEX SCF names include
mnemonics for some additional information beyond the simple phrasal
types. For example, the frame NP-AS-NP is a subclass of NP-PP, where
the preposition is lexicalized as {\it as}. The frame NP-TOBE
represents a direct object and a predicate using {\it to be}. The
frame THAT-S represents a sentential complement introduced by the
complementizer {\it that}, and TO-INF is an infinitival complement that uses the {\it to} form of the verb in the lower clause.

% Some examples of {\it subcategorization frames} (SCFs) are given here. For
% example, {\it decrease} may occur in a transitive frame, i.e.~with a single
% noun phrase (NP) complement \ref{ex:decrease-24}; with both an NP
% and a prepositional phrase (PP) \ref{ex:decrease-49}; or as
% an intransitive, i.e.~with no complements \ref{ex:decrease-22}. On the
% other hand, {\it compare} can appear in a transitive frame
% \ref{ex:compare-24} or with an NP and a PP \ref{ex:compare-49}, but
% not as an intransitive \ref{ex:compare-22}. (All examples adapted from the PMC OA corpus).



% % [TOM: WE NEED BIO EXAMPLES OF MORE ELABORATE SCFs .  See the commented out general language examples
% just below here: see if you can find some corresponding bio examples (they don't have to be from our actual corpus, you can just find them on the web)]


% Some examples of SCFs in general language are given
% in \ref{ex:genlg}. In \ref{ex:genlg-a}, the NP-as-NP frame represents
% one NP argument, and one PP containing the preposition {\it as} and
% its NP object. In \ref{ex:genlg-b}, the NP-TOBE frame represents one
% NP argument, and one clausal argument beginning with {\it to
% be}. In \ref{ex:genlg-c}, the NP-PRED-RS frame represents a single NP
% argument, which is interpreted as a predicate of the subject, and the
% verb is a so-called {\it raising} verb (i.e. the subject behaves as
% though it originates within the predicate).  And in \ref{ex:genlg-d},
% there is one NP and two PPs, all of which are strongly selected by the verb.

% \ex.\label{ex:genlg}\a.\label{ex:genlg-a} NP-as-NP: I sent him as a messenger.
% \b.\label{ex:genlg-b} NP-TOBE: I found her to be a good teacher.
% \c.\label{ex:genlg-c} NP-PRED-RS: He seemed a fool.
% \d. \label{ex:genlg-d} NP-PP-PP: She turned it from a disaster into a victory.

\begin{table}
\begin{tabular}{|l|p{0.75\columnwidth}|}
\hline
SCF & Example \\
\hline
\hline
NP-AS-NP & Perception of complex stimuli occurs too rapidly to \underline{support} rate coding as a reliable mechanism. \\
NP-TOBE & The larger, unsaturated propyne group has been \underline{shown} to be a useful modification for antisense oligonucleotides. \\
PP-PP & Threshold values \underline{ranged} from 0.01 to 0.99. \\
THAT-S & Experiments with PTEN-null PGCs in culture \underline{revealed} that these cells had greater proliferative capacity.\\
TO-INF & Administration of DA agonists to the rat PFC \underline{acts} to enhance working memory in these animals.\\
\hline
\end{tabular}
\caption{Sample SCFs. All examples adapted from the PMC OA corpus.}
\label{t:complex_scfs}
\end{table}


% NP-AS
% We used eszopiclone as a pharmacological tool to improve the quality of sleep in old age.
% The stroke was classified as a TIA.
% Perception of complex stimuli occurs too rapidly to support rate coding as a reliable mechanism.
% The study used bortezomib as a therapeutic agent in patients with metastatic kidney cancer.
% This technique allows us to treat the ECM as a mesh-like shell around each cell.

% NP-TOBE
% BT was known to be a disease occurring especially in sheep population causing typical clinical symptoms.
% The larger, unsaturated propyne group has been shown to be a useful modification for antisense oligonucleotides.
% This technique has been proven to be a promising method in treating a number of diseases.
% S. cerevisiae has proven to be a valuable experimental system.
% Analysis of the genome of Thermotoga maritima detected selected codon usage in highly expressed genes, but found this to be a relatively minor source of variation among genes.

% (NP-)PP-PP
% Substrate modification improved the clinical outcome from 67\% to 86\%.
% The mean DAI-10 scores were reduced in the abrupt group from baseline to endpoint.
% Threshold values ranged from 0.01 to 0.99.
% Seventy-three samples analyzed for K from 5 to 7 m depths, averaged 1.48 g/lit, and ranged from 1.4 to 1.49 g/lit.

% THAT-S
% Experiments with PTEN-null PGCs in culture revealed that these cells had greater proliferative capacity.
% The corresponding sagittal T1-weighted contrast enhanced image, revealed that the non-fatty component does not show any apparent enhancement.
% Previous studies have revealed that suppression of the SA signaling pathway enhances the activation of JA-induced responses.

% TO-INF
% Administration of DA agonists to the rat PFC acts to enhance working memory in these animals.
% Activation of endothelin receptors in the kidney leads to constriction of renal vessels.
% Alignments enhance the understanding of structure-function relationships by allowing common functional and structural regions in protein families to be identified.

% /auto/userfiles/tl318/dynamic_html/public_html/subcat_annotation_bk/tasks/scf_task.xml

SCFs can be compared with another argument structure representation
sometimes used in biomedicine, namely Predicate-Argument Structures
(PASs), which have been used in Semantic Role
Labeling \citep{wattarujeekrit:04,tsai:05,tsai:08}. SCFs are more
general than PASs, which include very specific per-verb roles such as,
for the verb {\it delete}, ``entity doing the removing'', ``thing being
removed'', and ``removed from''. SCFs do not identify thematic roles
such as Agent and Patient, nor functional roles such as Subject and
Object (though these types of roles can often be inferred from the
SCF); they identify only the syntactic phrase types that are selected
by the verb (NP, PP, etc.). SCFs thus provide a basic level of argument
structure information which can aid in event identification, yet are
general enough to be automatically acquired for a large number of
verbs. PASs, by contrast, must be defined on a per-verb basis and thus
can only practically be identified for a small number of very frequent
biomedical verbs.

An important notion for subcategorization is the {\it
argument-adjunct} distinction: the traditional linguistic notion of subcategorization, which is also the one typically used in general language, involves only arguments.  The hallmark of a syntactic {\it argument}
is that it is obligatory or very strongly selected by the
verb.\footnote{Recall, however, that most verbs take multiple SCFs
which may involve different obligatory arguments. Therefore, the
argument is properly considered to be obligatory with regard to the
verb-SCF pair, not just the verb.} Arguments are distinguished from
{\it adjuncts}, which are phrases that elaborate on an event and are
generally optional. This distinction is often relevant for classifying
prepositional phrases.
% comes into play most frequently in the case of PPs. 
In particular, PPs describing location, manner, or
time tend to be adjuncts. 
% For example, 
% In \ref{ex:argadj-a}, 
In Figure~\ref{f:argadj}, the PP
{\it on Sunday} is 
% entirely 
optional, elaborating on the event description by describing the time at
which the cooking event 
% described by the verb {\it cook} 
took place. 
% In \ref{ex:argadj-b}, 
By contrast, the PP {\it on the patient} is obligatory
and 
% takes 
exhibits a special, idiomatic meaning in the context of the verb {\it operate}.
The argument-adjunct distinction is sometimes fuzzy, because the judgement of optionality can be difficult to make, especially when a phrase type occurs with high frequency for a given verb. However, Figure~\ref{f:argadj} illustrates another criterion, namely that the meaning of arguments often depends on the particular verb, while
% The argument-adjunct distinction is sometimes fuzzy, but besides the criterion of being syntactically obligatory, another way to identify an argument is that its meaning is dependent on the particular 
% verbal head
% verb, as in \ref{ex:argadj-b}, while 
% arguments can be identified the meaning of an
% argument is more dependent on the meaning of the verbal head, while
adjuncts maintain their interpretation (e.g.~temporal, locative, manner) across a wide variety of verbal
heads \citep{grimshaw:90,pollard:87}. 
% Adjuncts with similar meanings
% (e.g.~temporal) can occur with a wide range of 
% % verbal heads
% verbs, while the
% same is not true for arguments that receive a specific interpretation
% when used with specific verbs \citep{pollard:87}. 
See
\citet{merlo:06,abend:10} for computational approaches to the argument-adjunct distinction.

\begin{figure}
\begin{tabular}{|l|}
\hline
ADJUNCT:\\ 
The chef cooked a good lunch on Sunday.\\[5pt]
ARGUMENT: \\
The surgeon operated on the patient. \\
\hline
\end{tabular}
\caption{Example adjunct and argument PPs.}
\label{f:argadj}
\end{figure}

% \ex.\label{ex:argadj}\a.\label{ex:argadj-a} ADJUNCT: The chef cooked a good lunch on Sunday.
% \b.\label{ex:argadj-b} ARGUMENT: The surgeon operated on the patient.



In biomedicine, subcategorization 
% however, it is often the case that subcategorization is defined more
is often defined more broadly,
to include adjuncts that are less strongly
selected but 
% are considered 
nevertheless
important for 
% a minimal 
the complete
description of an
event,
especially
 from the point of view of Information Extraction. \citet{cohen:06} state that ``knowledge representation in this
      [biomedical] domain requires that we {\it not} make a
      distinction between adjuncts and core arguments''. As they note,
      the tradeoff is 
% that you lose the similarity of adjuncts across verbs, 
a loss of some ability to generalize about adjuncts across verbs,
but they argue that
      this loss is outweighed by the ``biological integrity in the
      knowledge representation''.
Within a PAS
annotation scheme, for example, \citet{wattarujeekrit:04} include the
location PP in \ref{ex:loc} and the manner adverb in \ref{ex:manner}
as core arguments, neither of which would be considered arguments in general
language. Note that even under the broader definition, not every phrase type that co-occurs with the verb is an argument; \citet{wattarujeekrit:04} still consider aspectual or frequency adverbs such as {\it
  still} or {\it always} to be adjuncts.

\ex.\label{ex:loc} Apparently HeLa cells either initiate transcription \underline{at multiple sites within RPS14 exon 1} . . . \citep{wattarujeekrit:04}

\ex.\label{ex:manner} Mice have previously been shown to develop \underline{normally} . . . \citep{wattarujeekrit:04}


We will call these two views on subcategorization the ``syntactic''
view, which emphasizes syntactic obligatoriness, and the ``semantic''
view, which emphasizes a complete semantic description of an
event. Note that it is possible to use the same inventory of SCFs for
annotation on either view; the difference will be that more complex frames, e.g.~those involving PPs and adverbs, will be used more frequently on the 
semantic view. 
% From a statistical point of view, 
Put another way, the semantic view
implies using a lower 
% cutoff of evidence 
co-occurrence frequency threshold for considering a phrase to
be an argument of a verb; \citet{wattarujeekrit:04} consider a frequency of 20\% relative to the predicate to be sufficient, for
example.  In this paper we perform the first investigation of SCF
acquisition 
% with both types of gold standards.
that explicitly compares the two definitions of subcategorization.
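
The thresholding interpretation can be made concrete with a minimal sketch. The notation here ($c(v)$, $c(v,p)$, $f(p \mid v)$ and $\theta$) is illustrative and assumed for exposition only; it is not taken from the cited work, and a simple relative-frequency criterion is only one possible reading of the 20\% figure. Let $c(v)$ be the number of occurrences of a verb $v$, and $c(v,p)$ the number of those occurrences in which a phrase of type $p$ co-occurs with $v$. Under this reading, $p$ is treated as an argument of $v$ when
\[
f(p \mid v) \;=\; \frac{c(v,p)}{c(v)} \;\geq\; \theta .
\]
With $\theta = 0.2$, a hypothetical locative PP observed with 30 out of 100 occurrences of a verb has $f(p \mid v) = 0.3$ and would count as an argument, whereas under a stricter and purely illustrative threshold of $\theta = 0.5$ the same PP would be treated as an adjunct. The semantic view thus corresponds to a smaller $\theta$ than the syntactic view.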



